AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.61)

Neural Information Processing Systems | Feb-15-2026, 13:59:45 GMT

Smoothed Online Learning for Prediction in Piecewise Affine Systems

PWA systems allow for discontinuities across the separate regions ("pieces"), and are thus a simplified

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey (0.04)

Industry: Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Neural Information Processing Systems | Feb-8-2026, 19:30:34 GMT

1ac83203e88eb6cf6b30642f0239b932-Paper-Conference.pdf

Our analysis reveals that the Polyak step-size adapts to any directional smoothness to obtain the tightest possible convergence rate.
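For reference, the classical Polyak step-size sets eta_k = (f(x_k) - f*) / ||grad f(x_k)||^2, which assumes the optimal value f* is known. The following is a minimal sketch on a hypothetical two-dimensional quadratic. The objective, starting point, and function names are illustrative assumptions and are not taken from the paper.

```python
def f(x):
    # Hypothetical ill-conditioned quadratic; its minimum value f* = 0 is at the origin.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad_f(x):
    return [x[0], 10.0 * x[1]]

def polyak_gd(x0, f_star=0.0, steps=200):
    """Gradient descent with the Polyak step-size (f_star is assumed known)."""
    x = list(x0)
    for _ in range(steps):
        g = grad_f(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:
            break  # stationary point reached, nothing left to do
        eta = (f(x) - f_star) / g_norm_sq  # Polyak step-size
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

x_final = polyak_gd([1.0, 1.0])
```

For convex objectives this step-size never increases the distance to the minimizer, which is why no smoothness constant has to be supplied by hand.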

artificial intelligence, machine learning, smoothness, (18 more...)

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(6 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence | Oct-23-2025

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis

Lyu, Bochen, Zhang, Xiaojing, Zheng, Fangyi, Wang, He, Wang, Zheng, Zhu, Zhanxing

This paper establishes a continuous time approximation, a piece-wise continuous differential equation, for the discrete Heavy-Ball (HB) momentum method with explicit discretization error. Investigating continuous differential equations has been a promising approach for studying the discrete optimization methods. Despite the crucial role of momentum in gradient-based optimization methods, the gap between the original discrete dynamics and the continuous time approximations due to the discretization error has not been comprehensively bridged yet. In this work, we study the HB momentum method in continuous time while putting more focus on the discretization error to provide additional theoretical tools to this area. In particular, we design a first-order piece-wise continuous differential equation, where we add a number of counter terms to account for the discretization error explicitly. As a result, we provide a continuous time model for the HB momentum method that allows the control of discretization error to arbitrary order of the step size. As an application, we leverage it to find a new implicit regularization of the directional smoothness and investigate the implicit bias of HB for diagonal linear networks, indicating how our results can be used in deep learning. Our theoretical findings are further supported by numerical experiments.
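As a point of reference, the discrete Heavy-Ball update is commonly written as x_{k+1} = x_k - lr * grad f(x_k) + beta * (x_k - x_{k-1}). The sketch below runs that iteration on a hypothetical quadratic. The objective, the hyperparameter values, and the function names are assumptions chosen for illustration; the continuous-time model and counter terms from the paper are not reproduced here.

```python
def grad_f(x):
    # Gradient of the hypothetical quadratic f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [x[0], 10.0 * x[1]]

def heavy_ball(x0, lr=0.05, beta=0.9, steps=500):
    """Discrete Heavy-Ball iteration started with zero initial velocity."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(steps):
        g = grad_f(x)
        # x_{k+1} = x_k - lr * grad f(x_k) + beta * (x_k - x_{k-1})
        x_next = [xi - lr * gi + beta * (xi - xpi)
                  for xi, gi, xpi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

x_final = heavy_ball([1.0, 1.0])
```

With these assumed values both coordinates oscillate while decaying geometrically toward the minimizer at the origin.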

artificial intelligence, machine learning, optimization problem, (20 more...)

2506.14806

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Sports > Tennis (0.60)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Neural Information Processing Systems | Oct-9-2025, 20:03:12 GMT

1ac83203e88eb6cf6b30642f0239b932-Paper-Conference.pdf

directional smoothness, equation, smoothness, (16 more...)

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems | Oct-8-2025, 23:58:37 GMT

82096f4f6f897529ecd3eabea603e9cc-Paper-Conference.pdf

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey (0.04)

Industry: Education > Educational Setting > Online (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

arXiv.org Artificial Intelligence | Mar-6-2024

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Mishkin, Aaron, Khaled, Ahmed, Wang, Yuanhao, Defazio, Aaron, Gower, Robert M.

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general ...

One way to avoid global smoothness of f is to use local Lipschitz continuity of the gradient ("local smoothness"). Local smoothness uses different Lipschitz constants for different neighbourhoods, thus avoiding global assumptions and obtaining improved rates. However, such analyses typically require the iterates to be bounded, in which case local smoothness reduces to L-smoothness over a compact set (Malitsky & Mishchenko, 2020). Boundedness can be enforced in a variety of ways: Zhang & Hong (2020) break optimization into stages, Patel & Berahas (2022) develop a stopping-time framework, and Lu & Mei (2023) use line-search and a modified update. These approaches either modify the underlying optimization algorithm, require local smoothness oracles (Park et al., 2021), or rely on highly complex arguments.
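To make the notion of a path-dependent smoothness constant concrete: for a convex differentiable f, convexity gives f(y) <= f(x) + <grad f(x), y - x> + ||grad f(y) - grad f(x)|| * ||y - x||, so D(x, y) = 2 * ||grad f(y) - grad f(x)|| / ||y - x|| can play the role of a smoothness constant in a quadratic upper bound between two specific points. The sketch below checks this on a hypothetical quadratic. The objective and all names are assumptions for illustration, and the paper's own definitions may differ in detail.

```python
import math

def f(x):
    # Hypothetical convex quadratic; its global smoothness constant is L = 10.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad_f(x):
    return [x[0], 10.0 * x[1]]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def directional_smoothness(x, y):
    # D(x, y) = 2 * ||grad f(y) - grad f(x)|| / ||y - x||, defined for x != y.
    dg = [gy - gx for gx, gy in zip(grad_f(x), grad_f(y))]
    dx = [yi - xi for xi, yi in zip(x, y)]
    return 2.0 * norm(dg) / norm(dx)

def quadratic_upper_bound(x, y):
    # f(x) + <grad f(x), y - x> + (D(x, y) / 2) * ||y - x||^2
    dx = [yi - xi for xi, yi in zip(x, y)]
    inner = sum(gi * di for gi, di in zip(grad_f(x), dx))
    return f(x) + inner + 0.5 * directional_smoothness(x, y) * norm(dx) ** 2

# Along the well-conditioned axis the constant is far below the worst case 2 * L = 20.
d_easy = directional_smoothness([1.0, 0.0], [0.0, 0.0])
```

In this example a step along the first coordinate sees a constant of 2, while a step along the second coordinate sees the worst-case value of 20.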

directional smoothness, equation, smoothness, (13 more...)

2403.04081

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(8 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence | Oct-2-2023

Linear attention is (maybe) all you need (to understand transformer optimization)

Ahn, Kwangjun, Cheng, Xiang, Song, Minhak, Yun, Chulhee, Jadbabaie, Ali, Sra, Suvrit

Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized transformer model could actually be a valuable, realistic abstraction for understanding transformer optimization.

linear transformer, transformer, transformer optimization, (11 more...)

2310.01082

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Block, Adam, Rakhlin, Alexander, Simchowitz, Max

Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making

arXiv.org Artificial Intelligence | Feb-10-2023

The online learning setting has become the most popular regime for studying sequential decision making with dependent and potentially adversarial data. While this paradigm is attractive due to its great generality and minimal set of assumptions [Cesa-Bianchi and Lugosi, 2006], the worst-case nature of the adversary creates statistical and computational challenges [Rakhlin et al., 2015, Littlestone, 1988, Hazan and Koren, 2016]. In order to mitigate these difficulties, Rakhlin et al. [2011] proposed the smoothed setting, wherein the adversary is constrained to sample data from a distribution whose likelihood ratio is bounded above by 1/σ with respect to a fixed dominating measure, which ensures that the adversary cannot choose worst-case inputs with high probability. As in other online learning settings, performance is measured via regret with respect to a best-in-hindsight comparator [Cesa-Bianchi and Lugosi, 2006]. Recent works have demonstrated strong computational-statistical tradeoffs in smoothed online learning: while there are statistically efficient algorithms that can enjoy regret logarithmic in 1/σ, oracle-efficient algorithms necessarily suffer regret scaling polynomially in 1/σ [Haghtalab et al., 2022a,b, Block et al., 2022], where the learner is assumed access to an Empirical Risk Minimization (ERM) oracle that is able to efficiently optimize functionals on the parameter space. This gap is significant, because in many applications of interest, the natural scaling of σ is exponential in ambient problem dimension [Block and Simchowitz, 2022]. A natural question remains: under which types of smoothing is it possible to design oracle-efficient algorithms with regret that scales polynomially in problem dimension?
A partial answer was provided by Block and Simchowitz [2022], who demonstrate an efficient algorithm based on the John Ellipsoid which attains log(T/σ) poly(dimension)-regret for noiseless linear classification, and for a suitable generalization to classification with polynomial features.

artificial intelligence, inequality follow, machine learning, (17 more...)

2302.0543

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Block, Adam, Simchowitz, Max, Tedrake, Russ

Smoothed Online Learning for Prediction in Piecewise Affine Systems

arXiv.org Artificial Intelligence | Jan-26-2023

The problem of piecewise affine (PWA) regression and planning is of foundational importance to the study of online learning, control, and robotics, where it provides a theoretically and empirically tractable setting to study systems undergoing sharp changes in the dynamics. Unfortunately, due to the discontinuities that arise when crossing into different "pieces," learning in general sequential settings is impossible and practical algorithms are forced to resort to heuristic approaches. This paper builds on the recently developed smoothed online learning framework and provides the first algorithms for prediction and simulation in PWA systems whose regret is polynomial in all relevant problem parameters under a weak smoothness assumption; moreover, our algorithms are efficient in the number of calls to an optimization oracle. We further apply our results to the problems of one-step prediction and multi-step simulation regret in piecewise affine dynamical systems, where the learner is tasked with simulating trajectories and regret is measured in terms of the Wasserstein distance between simulated and true data. Along the way, we develop several technical tools of more general interest.
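To illustrate what a discontinuity across pieces looks like, the sketch below simulates a scalar piecewise affine system of the form x_{t+1} = a_i * x_t + b_i, where the index i is selected by the region containing x_t. The two regions, their coefficients, and the function names are hypothetical and chosen only for illustration; they are not the systems or algorithms studied in the paper.

```python
# Each region is (lower, upper, a, b): for lower <= x < upper, x_next = a * x + b.
# Hypothetical two-piece system whose dynamics jump at x = 0.
REGIONS = [
    (float("-inf"), 0.0, 0.5, 1.0),   # x < 0:  x_next = 0.5 * x + 1
    (0.0, float("inf"), 0.5, -1.0),   # x >= 0: x_next = 0.5 * x - 1
]

def pwa_step(x, regions):
    """Apply the affine map of the region that contains x."""
    for lower, upper, a, b in regions:
        if lower <= x < upper:
            return a * x + b
    raise ValueError("state lies outside every region")

def simulate(x0, steps, regions=REGIONS):
    """Roll the system forward and return the visited states."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(pwa_step(trajectory[-1], regions))
    return trajectory

trajectory = simulate(0.0, 3)
```

Two states that are arbitrarily close but on opposite sides of x = 0 are mapped roughly two units apart here, which is the kind of sharp change that makes prediction hard without a smoothness assumption on the inputs.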

artificial intelligence, machine learning, probability, (16 more...)

2301.11187

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.81)
(2 more...)